Note: There are often multiple ways to answer each question.

Load the ggplot2 and fueleconomy packages, as well as the vehicles dataset. Run the code below to extract just the first 1,000 rows of the dataset.

library(ggplot2)
library(fueleconomy)
data(vehicles)
vehicles <- vehicles[1:1000, ]

  1. Make a scatterplot of hwy vs. cty. Give axis titles and a main title to the plot to make it more interpretable.

  2. Modify the plot above so that the color of each point represents its cyl value. Also reduce the alpha of the points to an appropriate level and introduce jitter.

  3. Modify the plot above so that each value of cyl is in its own plot.

  4. While the plot above gives us a good idea of how cars with different cyl values compare with each other, a lot of the plot space is wasted. Modify the plot so that each little plot has its own x and y scale. (Hint: The documentation for facet_wrap might be helpful.)

  5. Make a barplot to show how many cars of each type of fuel there are in the dataset. (Use the geom_bar geom.) Change the theme to ggplot’s black and white theme.

  6. Make a violin plot to show the distribution of displ for each value of drive. Overlay that with a scatterplot of displ vs. drive (with jitter and alpha). How does the scatterplot give the reader more information?

  7. Make a (jittered) scatterplot of hwy against year with alpha value 0.5. Add a geom_smooth layer with option method = "lm" and without the SE bands.

  8. Modify the previous plot so that the color of the points depends on fuel. Also, change the theme to ggplot’s minimal theme and move the legend to the bottom of the plot. What happens to the geom_smooth layer?

  9. Make a (jittered) scatterplot of hwy vs. cty, with the color of the point depending on year. Change the color scale to “Spectral”. Do you see a trend?

  10. Modify the theme of the plot above to a theme you like and try a different color scale. Also, give the plot a title and make the title larger, bold, and centered.
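
As a starting point, here is one possible sketch for exercises 1 to 4, which build on each other. It is not the only answer: the axis labels, the title text, the alpha value of 0.3, and the object names p1 to p4 are illustrative assumptions, and the code assumes the first 1,000 rows of vehicles as described in the setup.

```r
library(ggplot2)
library(fueleconomy)

# First 1,000 rows of the dataset, as described in the setup
data(vehicles)
vehicles <- vehicles[1:1000, ]

# Exercise 1: scatterplot of hwy against cty with axis titles and a main title
# (label and title wording are assumptions, pick whatever reads best)
p1 <- ggplot(vehicles, aes(x = cty, y = hwy)) +
  geom_point() +
  labs(x = "City fuel economy (mpg)",
       y = "Highway fuel economy (mpg)",
       title = "Highway vs. city fuel economy")

# Exercise 2: color by cyl (treated as a category), with jitter and transparency
# (alpha = 0.3 is an arbitrary choice)
p2 <- ggplot(vehicles, aes(x = cty, y = hwy, color = factor(cyl))) +
  geom_jitter(alpha = 0.3) +
  labs(x = "City fuel economy (mpg)",
       y = "Highway fuel economy (mpg)",
       title = "Highway vs. city fuel economy",
       color = "Cylinders")

# Exercise 3: one panel per value of cyl
p3 <- p2 + facet_wrap(~ cyl)

# Exercise 4: let each panel have its own x and y scale
p4 <- p2 + facet_wrap(~ cyl, scales = "free")

print(p4)
```

Wrapping cyl in factor() gives a discrete color legend; mapping cyl directly would produce a continuous color gradient instead, which is another reasonable reading of exercise 2.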